System and method for internationalizing the content of markup documents in a computer system

ABSTRACT

The present invention concerns a method for internationalizing the content of markup documents (8), which consists of:
detecting, by means of a tool (11), a tag dedicated to the localization of the document (8) and one or more localization attributes of an element to be localized designated by said tag;
searching in a translation file (10) for the localization attributes and for the localized value of the element associated with this or these localization attributes;
replacing the tag in the document (8) with the localized value found in the translation file (10).
The present invention also relates to the system for implementing said method.

[0001] The present invention concerns a method for internationalizing the content of markup documents in a computer system, and more particularly the content of pages on the web (more commonly called web pages in computer literature), as well as a system for implementing this method.

PRIOR ART

[0002] The present invention relates to the internationalization of markup documents.

[0003] The term documents is intended in the broad sense, i.e., a text, a sound extract, a video document, a program, or any other type of information medium or combinations of such mediums.

[0004] A markup document is a document that includes tags or markers (both terms are used in computer literature), i.e., special codes that control, in particular, the structure and/or the appearance of the documents in the software using them.

[0005] The present application describes the example of a markup page on the Web, i.e. a computer document such as, for example, a text file, an image, or a video into which have been inserted special codes (the tags) that control the structure, the appearance, the dynamic behavior, etc., of the page in the software for navigating on the Web (commonly called a Web browser in computer literature). A Web browser is a piece of software used to present a document to a user, and to keep track of the relationships established between this document and other documents by means of links on the Web (commonly called Web links in computer literature). A Web link is a reference that makes it possible to designate an access protocol, a host system, an access path in this system, and possibly an anchor, thus making it possible to access a document or one of its parts.

[0006] Today, in the majority of cases, Web pages are created using markup languages. The most commonly used language is HTML (HyperText Markup Language). Other languages are beginning to be used, such as XML (Extensible Markup Language), but they are essentially very similar to HTML.

[0007] The internationalization of a document consists of allowing and facilitating the localization of said document into a given language or culture. The localization of a document is the procedure that consists of implementing means for transcribing said document into a given language or culture. Internationalization concerns, for example, the translation of text, sound and/or video messages, etc., the transformation of typed elementary data (dates, numbers, monetary values, etc.), concept representation (for example, the representation of an icon of the “DANGER sign” type from the highway code), information sorting (information sequencing), encoding (digital translation of a piece of information into a given format), and the manipulation of information (the manipulation of character sets: concatenation operations, capitalization, etc.).

[0008] It is important to note that what affects the presentation of a document has more to do with its personalization than its localization. For example, the choice of colors, the character fonts or character sizes, the layout of the paragraphs, etc., is generally not part of the internationalization/localization. On the other hand, certain aspects of the rendition of a document, like accommodating the direction in which texts are read, which creates problems in framing, positioning action buttons, etc., are internationalization problems.

[0009] In the case of software internationalization, localization is made necessary by the expansion of markets (increasing foreign sales), by client or even legislative requirements for using software and documents in one's native language, and by constraints related to integration, maintenance, confidentiality or patrimonial protection. Moreover, software designers do not want to handle the dissemination of the sources of their software, explain to third parties the places in which messages must be modified, provide support for the errors resulting from these modifications, reveal trade secrets, etc. Localization must therefore avoid the recompilation or delivery of sources.

[0010] Nowadays, there is no solution that handles the internationalization of the content of Web pages. In general, Web page providers simply duplicate the entire page and completely replace the content to be localized, in general manually. Linguistic/cultural experts are required to know the formatting language of the documents, for example the HTML language, and its subtleties, or use HTML page editors. In any case, they are required to have the pages in their entirety, and hence all of the HTML elements, in order to be able to work on them.

[0011] One problem addressed by the invention is to enable a software publisher to internationalize computer (software or other) documents, or to offer its clients Web pages that can be internationalized, while avoiding any client involvement in the localization process. Localization must avoid the need to translate the document, and for example all of the pages on the Web, into all of the languages.

[0012] Another problem is the growing complexity of the HTML language, and of structuring and formatting languages in general, the sophistication of Web page content (the content of the pages is becoming increasingly rich, with a growing number of presentation gimmicks), and the use of advanced (particularly HTML) editors that require more and more expertise on the part of translators.

[0013] One object of the present invention consists of allowing markup documents to be localized without any user intervention.

[0014] Another object consists of modifying the editors to allow the internationalization of markup pages.

[0015] Another object of the present invention consists of facilitating localization operations in a computer system, in particular by avoiding the need to recompile it or to deliver its sources.

SUMMARY OF THE INVENTION

[0016] In this context, the subject of the present invention is a method for internationalizing the content of markup documents, which consists of:

[0017] detecting a tag to be used in the localization of the document, one or more localization attributes, and possibly a default localization value associated with said tag by means of a localization tool;

[0018] searching, if necessary, in storage means in a translation file, for the localized value of the element associated with this or these localization attribute(s);

[0019] replacing the tag in the document with the localized value found in the translation file, or with the default localization value, or with a value obtained via automatic transcription functions.

[0020] The present invention also concerns a system for internationalizing the content of markup documents, comprising:

[0021] means for storing markup documents;

[0022] means for storing translation files for the documents;

[0023] a localization tool connected to said storage means and allowing the content of the document to be localized using the translation file.

[0024] The present invention also relates to a method for editing and internationalizing markup documents that consists, each time during the editing of the document (8) that a user enters content to be internationalized, of associating localization attributes with said content, proposing the entry of a default value of the content to be internationalized, and proposing the entry of all or some of the various values assumed by this content in the various target languages of the document being edited, of creating the document and the associated translation files from information obtained from the user, and storing said files in storage means.

[0025] The present invention concerns an editing and internationalization system comprising an editor in a machine for editing markup documents, which makes it possible to create reference files and associated translation files from information obtained from the user and store them in storage means.

PRESENTATION OF THE FIGURES

[0026] Other characteristics and advantages of the invention will become clear in light of the following description, given as an illustrative and non-limiting example of the present invention, in reference to the attached drawings in which:

[0027] FIG. 1 is a schematic view of an embodiment of the internationalization system according to the invention;

[0028] FIG. 2 is a schematic view of an embodiment of an editor that allows the internationalization according to the invention.

DESCRIPTION OF AN EMBODIMENT OF THE INVENTION

[0029] As shown in FIG. 1, which illustrates an embodiment of the internationalization system according to the invention, a computer system 1 is distributed and composed of machines 2-4 organized into one or more networks 5. A machine is a very large conceptual unit that includes both hardware and software. The machines can be quite diverse, such as workstations, servers, routers, specialized machines and gateways between networks. A machine comprises at least one processor, at least one memory, and possibly one or more peripherals. Only the components of the machines of the system 1 that are characteristic of the present invention will be described, the other components being known to one skilled in the art.

[0030] It should be noted that the machines 2-4 can be grouped with one another in various ways and can, for example, constitute one and the same machine.

[0031] FIG. 1 represents an exemplary embodiment of the internationalization system according to the invention.

[0032] The internationalization system according to FIG. 1 comprises a reference machine 2, a translation machine 3, and a localization machine 4.

[0033] The reference machine 2, in the example illustrated in FIG. 2, is connected to a network 6 of machines, such as for example the Internet. The reference machine 2 in the embodiment illustrated in FIG. 1 is a server for accessing the network 6. Pages, HTML pages in the example illustrated, are hosted by the server 2 and are retrieved by means of a file transfer protocol or by means of a Web page server interrogation protocol called HTTP (HyperText Transfer Protocol). The Web browsers present in machines of the network 6 implement this protocol (and several others as well) and download the Web pages and the associated files from one server to another.

[0034] The reference machine 2 contains means 7 for storing documents 8, reference files 8 in the example. The reference files 8 contain the information to be localized, expressed in a “pivot” language. The pivot language is the language capable of being the most widely known among translators: its utilization makes it possible to facilitate translations and avoid indirect translations (translation from German to English, then from English to Spanish, instead of a direct translation from English to Spanish if English is chosen as the pivot language). The translation machine 3 includes means 9 for storing translation files 10. The storage means 7 and 9 can be in any format, for example in the form of a hard disk or any other type of memory.

[0035] The localization machine 4 contains a localization tool 11 in the form of a software module. The localization tool can use means 12 for storing the correspondence between the type of the document 8 and the markup language used, tags of said markup language and its grammar and syntax, as well as automatic transcription functions. The storage means 12 are contained in the localization machine 4 or linked to the latter. In the embodiment illustrated in FIG. 2, it is in the form of a hard disk. The localization machine 4 makes it possible to create localized files 13 from the reference file 8 and from the translation file 10.

[0036] As shown in FIG. 2, the present invention also concerns a Web page editor 14, for example an HTML page editor. The editor 14 is a software module contained in an editing machine 15 and offers any user facilities for writing a Web page. The editor 14 is connected to storage means 16 such as, for example, a hard disk. The editor is connected to a reference machine 2, itself connected to a network 6 of machines, such as for example the Internet. The reference machine 2, as in the embodiment illustrated in FIG. 1, is a server 2 for accessing the network 6.

[0037] The system 1 according to the present invention works in the following way.

[0038] The system 1 localizes the content of documents 8; in the internationalization system illustrated in FIG. 1, the documents 8 are the reference files 8 that contain the elements to be localized.

[0039] The internationalization method according to the invention comprises a step for identifying the type of document 8 to be localized by means of the localization tool 11. The type of the document, and of the reference file in the example illustrated, can be designated, as desired, by using the file name (and more precisely, its extension), a magic number stored in the file header, or a reference to a document that makes it possible to define the format of the document (like, for example, the DTDs, “Document Type Definitions,” of the XML language). Thus, in the example of Annex 1, the type of the document is contained in the extension of the reference file 8, for example in the form of a suffix “.html” or “.htm”. Then, the tags <HTML>. . . </HTML>, which are characteristic of a Web document constructed with the HTML language, are retrieved.

[0040] Depending on the document type, the localization tool selects the markup language to be used to read the document and detect the localization tags, as seen above. The correspondences between the file extensions, for example, and the markup languages to be used, are contained in the storage means 12.
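
By way of non-limiting illustration, the selection of the markup language from the file extension can be sketched as follows in Python. The table EXTENSION_TO_LANGUAGE and the function detect_markup_language are hypothetical names introduced for this sketch; the table merely stands in for the correspondences held in the storage means 12, whose actual content and format are not specified here.

```python
import os

# Hypothetical correspondence table (standing in for the data held in the
# storage means 12): file extension -> markup language used to read the file.
EXTENSION_TO_LANGUAGE = {
    ".html": "HTML",
    ".htm": "HTML",
    ".xml": "XML",
}


def detect_markup_language(filename, table=EXTENSION_TO_LANGUAGE):
    """Return the markup language for a file name, or None if it is unknown."""
    _, extension = os.path.splitext(filename)
    # Extensions are compared case-insensitively (an assumption of this sketch).
    return table.get(extension.lower())
```

A call such as detect_markup_language("index.htm") selects "HTML" under this table; a magic number or a DTD reference could be consulted in the same way when the extension is not conclusive.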

[0041] It is important to note that the tools that make it possible to enrich the markup pages with localization attributes, and the software programs that use them, the localization tool in the example illustrated, must use the same conventions for representing localization tags, and the same semantics associated with the various attributes of these tags. The tools for editing markup pages, and the software for interpreting these pages must be able to be configured so as to support different forms of the same tag, as long as this form is unambiguous in the markup language used in the internationalized document. In the embodiments illustrated, the localization tool must be able to recognize the localization tag chosen for the language used.

[0042] The method includes a step for identifying each element to be localized in the document 8 in question, i.e., in the reference file 8, by the localization attributes. The localization attributes include at least one type, which may be a default type and hence absent from the tag. They can also include, for example, an identifier, parameters, and specific attributes of the type, as will be seen below. For example, the message “My small business” to be localized into various languages is identified by the unique identifier “1”; it is typed by the type TEXT (the element to be localized is a text).

[0043] The method according to the invention is based on the definition of tags designed to mark the identified elements to be localized using the localization attributes.

[0044] The method, by means of the localization tool 11 contained in the localization machine 4, consists of detecting in the reference file 8 a tag dedicated to localization, retrieving the localization attribute or attributes associated with said tags, searching in the storage means 9 for the translation file 10 corresponding to the target language or culture, searching in the selected file 10 for said localization attributes and the localization values associated with the unique localization attributes obtained in the reference file 8, and replacing the tags of the reference file 8 with the corresponding localization values provided by the translation file 10.

[0045] In the embodiment illustrated in FIG. 1, the method consists of localizing a Web page 8 contained in the Web page server 2.

[0046] The tags dedicated to localization use the syntax and the grammar of the tags of the HTML/XML language.

[0047] A summary of the characteristics of the HTML language, given below, will facilitate the understanding of the embodiment illustrated.

[0048] HTML is a markup language. A tag indicates to a Web browser which elements represent text, headers, links, images, or any other element that may be present on a Web page. The tags have a constant form of the following type: a “<” character, a name, possible parameters (provided in the form parameter-name “=” value-of-the-parameter), and at the end, a “>” character. The HTML language uses, for example, the following tags: <HTML>, <HEAD>, <TITLE>, <BODY>. Web browsers interpret the name contained between the “less than” and “greater than” symbols: for example, the name TITLE indicates that the text contained between the TITLE tags is a window title; the browser displays the text in the title bar at the top of the screen of the machine in question.

[0049] In general, tags work in pairs, but this is not always the case, as for example with the tag <P>, which alone indicates the start of a paragraph. For tags that work in pairs, the second tag differs from the first one by the character “/” in the second position. For example: <HTML> . . . </HTML>. The purpose of HTML tags is to formalize some of the aspects linked to the presentation and the structuring of the document, and to separate them from the content, the general objective being to have the same page content with different presentations; the presentations differ so as to adapt to the specific machine characteristics (monochrome or color screens, size of the screen, etc.), to the user's preferences (some of which impose the fonts and character sizes to be used for the titles, the texts, the code extracts, etc.), etc.

[0050] Annex 1 shows an example of an HTML document. As the example shows, HTML handles the “special” characters with particular keywords (like the “e with a circumflex accent” in the example illustrated, with the keyword “&ecirc;”).

[0051] In the HTML language, there are tags for declaring links to other pages on the Internet, inclusions of images, video, etc.

[0052] The HTML language is the subject of several standardization documents (essentially IETF and W3C). Providers of Web browser implementation accommodate these specifications, but add specific characteristics to them, in order to offer users more services and more capabilities for customizing Internet documents. This results in an incompatibility in representation from one browser to another. The following principle was therefore adopted: when a tag in an HTML document is not recognized by the browser, it is simply ignored and nothing is displayed.

[0053] The example illustrated is based on pages written in HTML for essentially two reasons:

[0054] the presence of tags that make it possible to isolate the internationalization information from the rest of the information, in particular display information and content,

[0055] the behavior of browsers faced with unknown tags offering debugging and delivery facilities, the reference file being able to be used as a standard provision for the pivot language of the application (the pivot language being the one with which the application works by default).

[0056] The present invention can be applied to documents other than Web pages written in HTML, if said documents are formalized with a markup language and the syntax of same is known.

[0057] The method according to the invention includes a step for defining tags. In the example illustrated, HTML/XML tags are chosen to be dedicated to the localization of markup/web page content.

[0058] For example, the following tags are used:

[0059] for text messages:

[0060] <LOC ID=message-identifier [TYPE=TEXT]>Default text (optional) expressed in the pivot language </LOC>

[0061] The type TEXT is the default type; if no other type is mentioned, the type of the element to be localized is the type TEXT by default.

[0062] The default text proposed is the one that can be used when a translation file is missing or when the content to be translated is absent from the translation file used for the target language.

[0063] The default text makes it possible to do without a translation file for the pivot language.
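
By way of non-limiting illustration, the replacement of a text tag, with fallback to the default text when no translation is available, can be sketched as follows in Python. The function localize_text, the regular expression LOC_TEXT, and the dictionary standing in for the translation file 10 are assumptions made for this sketch, as is the French value used in the usage note; the exact attribute grammar accepted by a real localization tool may differ.

```python
import re

# Matches <LOC ID=identifier [TYPE=TEXT]>default text</LOC>.
# The accepted attribute grammar is an assumption made for this sketch.
LOC_TEXT = re.compile(
    r'<LOC\s+ID="?(?P<id>[^\s">]+)"?(?:\s+TYPE="?TEXT"?)?\s*>'
    r'(?P<default>.*?)</LOC>',
    re.IGNORECASE | re.DOTALL,
)


def localize_text(document, translations):
    """Replace each text tag with its localized value, else its default text.

    `translations` maps message identifiers to localized values and stands
    in for the content of a translation file.
    """
    def replace(match):
        default = match.group("default").strip()
        # Fall back to the default (pivot-language) text when the identifier
        # is absent from the translations.
        return translations.get(match.group("id"), default)

    return LOC_TEXT.sub(replace, document)
```

With the page fragment "<TITLE><LOC ID=1>My small business</LOC></TITLE>", passing a hypothetical French entry {"1": "Ma petite entreprise"} yields the localized title, while passing an empty dictionary yields the default text, which corresponds to the case in which no translation file is provided for the pivot language.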

[0064] For the date type fields:

[0065] <LOC TYPE=DATE FORMAT=format>Date and/or time expressed in a neutral format (ex: AAAAMMJJHHMMSS) parameterizable </LOC>

[0066] The format specifies the meaning of the fields expressed in the value provided between the two tags. For example, to represent a time, the value of the FORMAT field is: “HHMM” or “HHMMSS”. This format has little to do with what will actually be displayed (for example: “19:28:30” or “19h28m30s”), but it makes it possible to give a meaning to the value to be transcribed.
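
By way of non-limiting illustration, the separation between the neutral value, whose fields are given a meaning by FORMAT, and the displayed form can be sketched as follows in Python. Only the two time formats mentioned above are handled; the names parse_time, render_time, and TIME_STYLES, and the two style keys, are hypothetical and introduced solely for this sketch.

```python
def parse_time(value, fmt):
    """Split a neutral time value into numeric fields according to FORMAT."""
    if fmt not in ("HHMM", "HHMMSS"):
        raise ValueError(f"unsupported FORMAT: {fmt!r}")
    if len(value) != len(fmt) or not value.isdigit():
        raise ValueError(f"value {value!r} does not match FORMAT {fmt!r}")
    # Each field occupies two digits: hours, minutes, then optional seconds.
    return [int(value[i:i + 2]) for i in range(0, len(value), 2)]


# Hypothetical display conventions; a real tool would select one per culture.
TIME_STYLES = {
    "colon": lambda fields: ":".join(f"{n:02d}" for n in fields),
    "letters": lambda fields: "".join(
        f"{n:02d}{unit}" for n, unit in zip(fields, "hms")
    ),
}


def render_time(value, fmt, style):
    """Transcribe a neutral time value into the requested display style."""
    return TIME_STYLES[style](parse_time(value, fmt))
```

The same neutral value "192830" with FORMAT "HHMMSS" is thus rendered either with colons or with unit letters, depending only on the style selected for the target culture.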

[0067] For the number type fields:

[0068] <LOC TYPE=NUM FORMAT=format>Number defined in a neutral format (ex: [+|−|]AAA[.BBB][e[+|−|]CCC]) parameterizable </LOC>

[0069] The format specifies the meaning of the fields expressed in the value provided between the two tags. For example, to represent an integer, the value of the FORMAT field is: “[+|−|]AAA”.

[0070] For the currency type fields:

[0071] <LOC TYPE=CUR>Number defined in a neutral format (ex: [+|−|]AAA[.BBB]) parameterizable </LOC>

[0072] For the image type fields (icons, etc.):

[0073] <LOC ID=message-identifier TYPE=NUM>Default path (optional), corresponding to the pivot language </LOC>

[0074] In the present example, the tags are not dedicated to a particular language. Their syntax, on the other hand, is that of the HTML and XML languages, thus making it possible to cover a wide range of documents. The method is also applicable to the parent language of these two languages, i.e., SGML. The choice of the name of the tag must be configurable: it is essential that there not be any collision with tags that are already defined in the language into which the localization tags must be inserted.

[0075] The utilization of the keyword LOC as a tag identifier is proposed as an example. This keyword could be replaced by another keyword (LOCAL, LOCALIZATION, etc.) in other markup languages that are already using this keyword. The choice of this keyword should be made so that it is unique and unambiguous in the markup language used, and so that it is recognized by the various tools manipulating the markup pages (the page editors and the tool 11, in particular).

[0076] The presence of a unique identifier of content to be localized associated with each localization tag can be made optional for certain types of data. For numeric type data, for example, it is possible to use automatic transcription functions, such as for example standard localization functions (provided in the form of programs, stored in the storage means 12 of the tool 11) that make it possible to automate the reformatting of the information in a given language or culture from a pivot data format. For example, in the particular case of English-speaking cultures, these programs make it possible to receive as input a numeric value, and to produce as output a display representation that systematically includes a comma for separating the figures into thousands. They make it possible to automate certain translation tasks, and in particular the rendition of numeric values; they avoid a write operation in the translation files. On the other hand, the presence of a unique content identifier is mandatory for textual content, since it is the search key that will be used to find the localized message in the translation files. This key is justified by the fact that the translation of textual content cannot currently be automated in a completely reliable way, and hence, it is not possible to do without translation files specially formatted to reflect the content of the page to be translated.
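
By way of non-limiting illustration, an automatic transcription function of this kind can be sketched as follows in Python. The function group_thousands and its parameters are hypothetical; the sketch accepts only the sign, integer, and fractional parts of the neutral format (written with an ASCII "-" sign), and deliberately ignores the exponent part.

```python
def group_thousands(neutral, group_sep=",", decimal_sep="."):
    """Render a neutral number ([+|-]AAA[.BBB]) with grouped thousands."""
    sign = ""
    if neutral[:1] in ("+", "-"):
        # A leading "+" is dropped in the displayed form (an assumption).
        sign = "-" if neutral[0] == "-" else ""
        neutral = neutral[1:]
    integer, _, fraction = neutral.partition(".")
    if not integer.isdigit() or (fraction and not fraction.isdigit()):
        raise ValueError("not a number in the neutral format")
    # Cut the integer part into groups of three digits, starting from the right.
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)
    result = sign + group_sep.join(groups)
    if fraction:
        result += decimal_sep + fraction
    return result
```

With the default separators, the neutral value "1234567.5" is displayed with commas between thousands, without any entry in a translation file; other cultures would be served by passing different separators.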

[0077] Tags as defined above for localization have been inserted into the content of the page illustrated in Annex 1 in order to allow said page to be localized: the localized page appears in two different ways in Annexes 2 and 3. Thus, for example, the text message “My small business” is designated by the following localization tag:

[0078] <LOC ID=1>My small business. </LOC>

[0079] The localization attributes are the following: The identifier designated by the tag is “1”. The default text expressed in the pivot language, in this case English, is “My small business”. The type is not expressed; it is a default type, i.e., the type TEXT.

[0080] Annex 2 gives only text messages as examples, i.e. tags of the following type:

[0081] <LOC ID= . . . > . . . </LOC>

[0082] The content of the page in Annex 3 is simpler but requires the translation file, such as the English translation file in Annex 4, that provides the value to be associated with each of the identifiers named in the document. In the example of Annex 3, the page cannot be directly displayed in the pivot language: it must pass through the localization tool that allows it to be translated.

[0083] It is possible to define particular types for information sources not provided by the HTML language that are capable of being localized. For example, in certain countries or in certain working environments, a clanking sound is generated whenever an error occurs. The type of sound emitted when a particular event occurs differs depending on the countries and the customs of each. It is therefore possible to offer an additional type “SOUND” that makes it possible to handle this type of situation. Another example concerns the color conventions used by certain cultures to express certain concepts or certain events: abundance or wealth may be represented by yellow or red, and mourning may be represented by black, white or red. It is possible to offer a “COLOR” tag that includes a “CONCEPT” attribute and a “VALUE” attribute for representing this type of situation.

[0084] The localization tool 11 detects the localization tags and the localization attributes associated with said tags, searches in the storage means 9 for the translation file 10 corresponding to the target language or culture, then searches in the selected file 10 for the localization attributes and the localization values associated with said unique localization attributes obtained in the reference file 8.

[0085] The method also consists of defining said translation file 10. The format of the translation file 10 is not very important: it depends on the tool 11 responsible for performing the localization of the documents. The translation file 10 includes one or more unique localization attributes, and in most cases, as seen above, a unique identifier or identifiers associated with a localized value that corresponds to the identifier for a given language. Annex 4 shows an example of a translation file 10 capable of being associated with the reference file 8 of Annexes 2 and 3 in order to display its content in English. The translation file 10 constitutes the content model; a content model is richer than a structure model: the content model specifies more than just the position of the titles and the paragraphs (which in general are designated by the “structure” of the document). The content model also indicates which information is to be provided, and within it, which information is to be localized (with the associated localization parameters).
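
Since the format of the translation file 10 is left open, the following Python sketch shows only one possible convention, assumed purely for illustration: one "identifier=localized value" entry per line, with lines starting with "#" treated as comments. This convention, the function parse_translation_file, and the entries in the usage note are hypothetical and are not taken from Annex 4.

```python
def parse_translation_file(text):
    """Parse 'identifier=localized value' lines into a lookup table.

    The line-based layout is an assumption made for this sketch only.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        identifier, separator, value = line.partition("=")
        if separator:
            # The identifier is the search key used to find the localized value.
            entries[identifier.strip()] = value.strip()
    return entries
```

For a hypothetical file containing the lines "1=My small business" and "2 = Welcome to %1", the resulting table maps the identifier "1" to "My small business" and the identifier "2" to "Welcome to %1".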

[0086] In the example illustrated, the localization tool of the localization machine produces a web page from the reference file 8 and the translation file 10. To do this, the localization tool 11 takes the reference file 8 and replaces the localization tags of said reference file 8 with the localized values of the identifiers of said tags given by the appropriate translation file 10.

[0087] The tags can delimit messages that contain parameters; in general, the parameters are raw data to be displayed as is. The software programs must be able to handle this type of situation, such as for example error messages of the following type: “Error No. 1001: the file C:\COMMAND.COM does not exist.” This error message includes two parameters. The order of appearance of the parameters is important. It is not possible to divide this message, concatenating the following segments of it:

[0088] <LOC NUM=1>Error No. </LOC>

[0089] <LOC NUM=2>1001</LOC>

[0090] <LOC NUM=3>: the file </LOC>

[0091] <LOC NUM=4>C:\COMMAND.COM</LOC>

[0092] <LOC NUM=5>does not exist.</LOC>.

[0093] In fact, certain languages do not translate this message in the same way; they may not accept, or may delete, one of these five messages, or even change the order of the parameters or messages. In English, for example, the message may be transformed in the following way:

[0094] <LOC NUM=1>Error No. </LOC>

[0095] <LOC NUM=2>1001</LOC>

[0096] <LOC NUM=3>C:\COMMAND.COM</LOC>

[0097] <LOC NUM=4>file does not exist.</LOC>

[0098] The method according to the invention therefore consists of numbering the parameters of the messages PARAM1, PARAM2, etc., and of inserting labels, for example such as “%number”, into said messages. The invention handles the preceding case in the following way:

[0099] <LOC NUM=1 PARAM1=“1001” PARAM2=“C:\COMMAND.COM”>Error No. %1: the file %2 does not exist </LOC>

[0100] In the example of Annexes 2 and 3, the site Quincaillerie.com is a parameter since it does not change no matter what the language, the country or the culture. The tag numbers this first parameter PARAM1 and identifies it with the label %1. This information is stored in the reference file. It will be used by the localization tool responsible for re-reading the reference file and the files containing the localized messages in order to constitute the final localized document.
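
By way of non-limiting illustration, the insertion of numbered parameters into a localized message can be sketched as follows in Python. The function substitute_parameters is a hypothetical name, and the reordered message in the usage note is an invented example of a translation that changes the order of the parameters.

```python
import re


def substitute_parameters(template, params):
    """Replace the labels %1, %2, ... with the values of PARAM1, PARAM2, ...

    `params` is the list of parameter values in tag order (PARAM1 first).
    """
    def replace(match):
        index = int(match.group(1))
        if 1 <= index <= len(params):
            return params[index - 1]
        return match.group(0)  # unknown label: leave it untouched

    # A replacement function is used so that backslashes in values such as
    # file paths are inserted literally.
    return re.sub(r"%(\d+)", replace, template)
```

With the parameters "1001" and "C:\COMMAND.COM", the message "Error No. %1: the file %2 does not exist" is completed in tag order, while a localized message written as "%2: file not found (error %1)" receives the same values in a different order, without any change to the reference file.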

[0101] It is common for portions of codes in HTML to be generated dynamically on the client end, in the browser, thanks to portions of code written in languages like JavaScript and embedded into the main HTML code.

[0102] In order to allow the dynamically generated HTML content to be localized, the method according to the invention consists of:

[0103] implementing the localization tool 11 in the dynamic code generation language, for example in JavaScript;

[0104] including the loading of the code of the corresponding JavaScript localization tool 11 in the main HTML web page (the one that generates HTML code on the fly in the client);

[0105] having the JavaScript localization tool 11 load the translation files 10 required for the localization of the HTML code generated in the client;

[0106] making use of the JavaScript localization tool 11 as the HTML code is generated.

[0107] Instead of using a JavaScript version of the localization tool 11, the designer of the Internet document can use CGI (Common Gateway Interface) components. The CGI components are located in the server 2 and make it possible to execute actions, interrogate databases, etc.; they are intended to generate HTML pages, and the CGI standard is an Internet standard. The CGI components are capable of performing the necessary localization operations when they are sent a variable that gives an indication of the target language offered to the user.
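
By way of non-limiting illustration, the way a server-side component might read the target-language variable from the query string of a request can be sketched as follows in Python. The variable name "lang", the function target_language, and the fallback to a pivot language are assumptions made for this sketch; the text does not specify how the variable is named or transmitted.

```python
from urllib.parse import parse_qs


def target_language(query_string, supported, pivot="en"):
    """Pick the target language from a query string such as 'lang=fr'.

    Falls back to the pivot language when the variable is absent or names
    a language for which no translation file is available.
    """
    for value in parse_qs(query_string).get("lang", []):
        if value.lower() in supported:
            return value.lower()
    return pivot
```

The selected language would then determine which translation file 10 is used when the component generates the HTML page.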

[0108] The embodiments of the method according to the present invention are quite varied. According to one embodiment, which fits into the context of a software development process, the step for creating localized files is done “in the factory” prior to the storage of all the files of the application on CD-ROM, in which case the localized files are made available to the producer before the burning of the CD-ROM and the localized files are delivered (with or without the reference files) directly on the CD-ROM. This embodiment avoids the need to deliver, document, and maintain the localization tool 11 for third parties.

[0109] The localization tool 11 may be delivered (with the reference files 8) to third parties so that they themselves can expand the number of languages supported. This requires documentation of the reference files 8 in order to facilitate the creation of translation files 10, and the establishment of a structure capable of responding to questions from these third parties.

[0110] Another embodiment consists of delivering the localization tool 11 and the reference files 8, the localized files being created upon installation of the software (which saves space on the CD-ROM of said software), on request (some time after installation) or “on the fly” (i.e., during execution, as explained above in connection with JavaScript solutions embedded into web pages, or the use of CGI processes).

[0111] Another embodiment of the invention concerns MP3 CD-ROM readers into which XML files are written. The XML files contain information on the titles stored on the CD-ROM, the words associated with each of the titles, and the MP3 encoding of the titles in question. The XML files constitute reference files 8. When the CD-ROM reader reads the XML files, it is connected to a localization tool and translation files that make it possible to create localized XML files based on the country in which one is located.

[0112] The method according to the invention is also capable of being implemented in the Web page editor 12. The editor 12, each time a user enters content to be internationalized (a text, in particular), associates a unique identifier with said content, proposes the entry of a default value of the content to be internationalized, and proposes the entry of various values assumed by this content in the various target languages of the document being edited. The editor creates the reference file 8 and the associated translation files 10, and stores them in storage means 16. The editor offers:

[0113] ergonomic entry of these various messages for the various target languages of the document;

[0114] ease in storing and creating translation files 10 that are readable by translators that do not have the tool for editing the content to be localized;

[0115] easy creation of localized content from reference files 8 and translation files 10.

[0116] Unlike equivalent editors that might exist for creating particular documents, this editor would have to store localization attributes associated with the document element to be localized: the type of element localized (text, a number, a monetary value, an icon, a sound, a color, etc.), the parameters associated with this type, the parameters associated with the message to be localized (for messages that are partly fixed and partly variable).
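By way of illustration, the information the editor stores for each element, and the creation of the reference file and translation files from it, might look like the following Python sketch. All names (LocalizedElement, build_files, the field names) are hypothetical, and the file layouts simply imitate Annex 2 (LOC tags) and Annex 4 (identifier followed by message); the patent does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class LocalizedElement:
    """One piece of content entered in the editor (hypothetical layout)."""
    identifier: int                    # unique identifier
    element_type: str                  # e.g. "text", "number", "icon"
    default_value: str                 # value in the pivot language
    type_parameters: dict = field(default_factory=dict)
    message_parameters: dict = field(default_factory=dict)
    values: dict = field(default_factory=dict)  # language code -> value


def build_files(elements, languages):
    """Return (reference file text, {language: translation file text})."""
    reference = "\n".join(
        f"<LOC ID={e.identifier}>{e.default_value}</LOC>" for e in elements
    )
    translation_files = {
        lang: "\n".join(
            f"{e.identifier} {e.values[lang]}"
            for e in elements
            if lang in e.values  # skip elements not yet translated
        )
        for lang in languages
    }
    return reference, translation_files


elements = [
    LocalizedElement(1, "text", "My small business...",
                     values={"en": "My small business..."}),
    LocalizedElement(3, "text", "Small equipment",
                     values={"en": "Small equipment"}),
]
reference, files = build_files(elements, ["en"])
print(reference)
print(files["en"])
```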

[0117] One advantage of the present invention stems from the behavior of browsers faced with unknown tags, which here include the tags dedicated to localization: as seen above, when the browser encounters such tags, it ignores them. Thus, the reference file containing the original page expressed in the pivot language may be used as is, without having to be passed through the localization tool 11 (with a few exceptions, particularly related to messages with parameters).

[0118] Another advantage is the existence of markup language syntax analyzers; they are numerous, compact and very easy to use. The localization tool could be built on top of an existing syntax analyzer.
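As an illustration of building on an existing syntax analyzer, the following Python sketch uses the standard library class html.parser.HTMLParser to detect LOC tags and collect their attributes and default text. The class name LocTagCollector is hypothetical, and nested LOC elements are assumed not to occur. Note that this parser reports tag and attribute names in lower case.

```python
from html.parser import HTMLParser


class LocTagCollector(HTMLParser):
    """Collect (attributes, default text) for every LOC element found."""

    def __init__(self):
        super().__init__()
        self.found = []       # list of (attribute dict, default text)
        self._current = None  # attributes of the LOC element being read
        self._text = []       # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "loc":
            self._current = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "loc" and self._current is not None:
            self.found.append((self._current, "".join(self._text).strip()))
            self._current = None


collector = LocTagCollector()
collector.feed("<UL><LI><LOC ID=4>Nuts</LOC></LI><LI><LOC ID=5>Bolts</LOC></LI></UL>")
collector.close()
print(collector.found)
```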

[0119] The present invention concerns the method for internationalizing the content of markup documents 8 that consists of:

[0120] detecting a tag dedicated to the localization of the document 8, the localization attribute or attributes, and possibly a default localization value associated with said tag by means of the localization tool 11;

[0121] searching, if necessary, in the storage means 9 in the translation file 10, for the localized value of the element associated with this or these localization attribute(s);

[0122] replacing the tag in the document 8 with the localized value found in the translation file 10, or with the default localization value, or with a value obtained via automatic transcription functions.
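The three steps above (detection, search, replacement) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: tags are detected with a regular expression, only the ID attribute is read, parameters such as PARAM1 are not handled, and the automatic transcription option is omitted. The function name localize and the French value in the example are hypothetical.

```python
import re

# Detects <LOC ID=n>default value</LOC>; assumes no '>' inside the tag.
LOC_PATTERN = re.compile(
    r"<LOC\s+ID=(\d+)[^>]*>(.*?)</LOC>", re.IGNORECASE | re.DOTALL
)


def localize(document: str, translations: dict) -> str:
    """Replace each LOC element by its localized value.

    `translations` maps identifiers to localized values. When an
    identifier is absent, the default value enclosed by the tag is used.
    """
    def replace(match: re.Match) -> str:
        identifier = int(match.group(1))
        default_value = match.group(2).strip()
        return translations.get(identifier, default_value)

    return LOC_PATTERN.sub(replace, document)


reference = "<H2><LOC ID=3>Small equipment</LOC></H2><LI><LOC ID=4>Nuts</LOC></LI>"
translations = {3: "Petit outillage"}  # illustrative value, not from the source
print(localize(reference, translations))
```

In this example identifier 3 is found in the translation table, while identifier 4 is not and therefore keeps its default value.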

[0123] The method consists of searching for the type of the document 8 in order to recognize the tags used in said document and their grammar and syntax, and performing a detection of the tags dedicated to localization.

[0124] The method consists of using as localization attributes a unique identifier, an element type, and possibly parameters and/or specific attributes of the type.

[0125] The tag dedicated to localization assumes the formalism of a markup language.

[0126] The method consists of using tags that are not provided in the markup language used for localization purposes.

[0127] The method consists of creating, prior to the detection, the translation file 10 that includes the localization attribute or attributes of the element or elements to be localized, associated with the corresponding localized value of the localization attribute or attributes in a given language.

[0128] Prior to the detection, the localization tool 11 is implemented in a dynamic code generation language, and the code of the tool 11 is loaded into the document 8, which dynamically generates its own code, the replacement of the tags taking place as the code of the document 8 is generated dynamically.

[0129] The present invention relates to the system for implementing the method described above, characterized in that it includes the localization tool 11 and the means 9 for storing the translation file. The present invention also relates to a system for internationalizing the content of markup documents 8, comprising:

[0130] the means 7 for storing markup documents 8;

[0131] the means 9 for storing the translation files 10 of the documents 8; the localization tool 11 connected to said storage means 7, 9 and allowing the content of the document 8 to be localized using the translation file.

[0132] The localization tool 11 is implemented in a dynamic code generation language, and the code of the tool 11 is loaded into the document 8, which dynamically generates its own code.

[0133] The localization tool 11 is a CGI component.

[0134] The present invention concerns a method for editing and internationalizing markup documents 8 that consists, each time during the editing of the document 8 that a user enters content to be internationalized, of associating the localization attribute or attributes with said content, proposing the entry of a default value of the content to be internationalized, and proposing the entry of all or some of the various values assumed by this content in the various target languages of the document being edited, of creating the document 8 and the associated translation files 10 from information obtained from the user, and storing said files in the storage means 16.

[0135] The present invention concerns an editing and internationalization system comprising the editor 14 in the machine 15 for editing markup documents 8, which makes it possible to create reference files and associated translation files from information obtained from the user and store them in the storage means 16.

[0136] The present invention concerns the method for internationalizing the content of markup documents 8, which consists of:

[0137] Defining tags dedicated to localization;

[0138] Identifying the information to be localized in the document 8 by means of one or more localization attributes;

[0139] Associating the localization tags with the localization attributes in the document 8 in order to allow its localization.

ANNEX 1

<HTML>
<HEAD>
</HEAD>
<TITLE>
Quincaillerie.com
</TITLE>
<BODY>
<H1><CENTER>My small business...</CENTER></H1>
Vous &ecirc;tes sur le site <B>Quincaillerie.com</B><P>
[You are on the site <B>Quincaillerie.com</B><P>]
<H2>Small equipment</H2>
<UL>
 <LI>Nuts</LI>
 <LI>Bolts</LI>
</UL>
<H2>Household appliances</H2>
<UL>
 <LI>Washing machine</LI>
 <LI>Dishwasher</LI>
</UL>
<H2>My partners</H2>
<UL>
 <LI><A HREF=“http://www.bullsoft.com”>BullSoft</A></LI>
</UL>
...
</BODY>
</HTML>

[0140] ANNEX 2

<HTML>
<HEAD>
</HEAD>
<TITLE>
Quincaillerie.com
</TITLE>
<BODY>
<H1><CENTER><LOC ID=1>My small business...</LOC></CENTER></H1>
<LOC ID=2 PARAM1=“<B>Quincaillerie.com</B>”>Vous &ecirc;tes sur le site de %1</LOC><P>
[<LOC ID=2 PARAM1=“<B>Quincaillerie.com</B>”>You are on the site %1</LOC><P>]
<H2><LOC ID=3>Small equipment</LOC></H2>
<UL>
 <LI><LOC ID=4>Nuts</LOC></LI>
 <LI><LOC ID=5>Bolts</LOC></LI>
</UL>
<H2><LOC ID=6>Household appliances</LOC></H2>
<UL>
 <LI><LOC ID=7>Washing machine</LOC></LI>
 <LI><LOC ID=8>Dishwasher</LOC></LI>
</UL>
<H2><LOC ID=9>My partners</LOC></H2>
<UL>
 <LI><A HREF=“http://www.bullsoft.com”>BullSoft</A></LI>
</UL>
...
</BODY>
</HTML>

[0141] ANNEX 3

<HTML>
<HEAD>
</HEAD>
<TITLE>
Quincaillerie.com
</TITLE>
<BODY>
<H1><CENTER><LOC ID=1> </LOC></CENTER></H1>
<LOC ID=2 PARAM1=“<B>Quincaillerie.com</B>”> </LOC><P>
<H2><LOC ID=3> </LOC></H2>
<UL>
 <LI><LOC ID=4> </LOC></LI>
 <LI><LOC ID=5> </LOC></LI>
</UL>
<H2><LOC ID=6> </LOC></H2>
<UL>
 <LI><LOC ID=7> </LOC></LI>
 <LI><LOC ID=8> </LOC></LI>
</UL>
<H2><LOC ID=9> </LOC></H2>
<UL>
 <LI><A HREF=“http://www.bullsoft.com”>BullSoft</A></LI>
</UL>
...
</BODY>
</HTML>

ANNEX 4

[0142] 1 My small business . . .

[0143] 2 You are on the site Quincaillerie.com

[0144] 3 Small equipment

[0145] 4 Household appliances

[0146] 5 . . . 
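A translation file laid out as in Annex 4, with one message per line consisting of the identifier followed by the localized value, could be read into a lookup table as sketched below in Python. The one-line-per-message layout and the function name parse_translation_file are assumptions made for illustration; the actual file format is not specified here.

```python
def parse_translation_file(text: str) -> dict:
    """Return {identifier: localized value} from 'identifier value' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        identifier, _, message = line.partition(" ")
        # Assumption: lines that do not start with a number are skipped.
        if identifier.isdigit():
            table[int(identifier)] = message.strip()
    return table


annex_4 = """\
1 My small business...
2 You are on the site Quincaillerie.com
3 Small equipment
4 Household appliances
"""
print(parse_translation_file(annex_4))
```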

1. Method for internationalizing the content of markup documents (8), which consists of: detecting a tag dedicated to the localization of the document (8), one or more localization attributes, and possibly a default localization value associated with said tag by means of a localization tool (11); searching, if necessary, in storage means (9) in a translation file (10) for the localized value of the element associated with this or these localization attribute(s); replacing the tag in the document (8) with the localized value found in the translation file (10), or with the default localization value, or with a value obtained via automatic transcription functions.
 2. Method according to claim 1, characterized in that it consists of searching for the type of the document (8) in order to recognize the tags used in said document and their grammar and syntax, and performing a detection of the tags dedicated to localization.
 3. Method according to either of claims 1 and 2, characterized in that it consists of using as localization attributes a unique identifier, an element type, and possibly parameters and/or specific attributes of the type.
 4. Method according to any of claims 1 through 3, characterized in that the tag dedicated to localization assumes the formalism of a markup language.
 5. Method according to claim 4, characterized in that it consists of using tags that are not provided in the markup language used for localization purposes.
 6. Method according to any of claims 1 through 5, characterized in that it consists of creating, prior to the detection, the translation file (10) that includes the localization attribute or attributes of the element or elements to be localized, associated with the corresponding localized value of the localization attribute or attributes in a given language.
 7. Method according to claim 1, characterized in that prior to the detection, the localization tool (11) is implemented in a dynamic code generation language, and the code of the tool (11) is loaded into the document (8), which dynamically generates its own code, the replacement of the tags taking place as the code of the document (8) is generated dynamically.
 8. System for implementing the method according to any of claims 1 through 7, characterized in that it includes a localization tool (11) and means (9) for storing a translation file.
 9. System for internationalizing the content of markup documents (8), comprising: means (7) for storing markup documents (8); means (9) for storing the translation files (10) of the documents (8); a localization tool (11) connected to said storage means (7, 9) and allowing the content of the document (8) to be localized using the translation file (10).
 10. System according to claim 9, characterized in that the localization tool (11) is implemented in a dynamic code generation language, and the code of the tool (11) is loaded into the document (8), which dynamically generates its own code.
 11. System according to claim 9, characterized in that the localization tool (11) is a CGI component.
 12. Method for editing and internationalizing markup documents (8) that consists, each time during the editing of the document (8) that a user enters content to be internationalized, of associating one or more localization attributes with said content, proposing the entry of a default value of the content to be internationalized, and proposing the entry of all or some of the various values assumed by this content in the various target languages of the document being edited, of creating the document (8) and associated translation files (10) from information obtained from the user, and storing said files in storage means (16).
 13. Editing and internationalization system comprising an editor (14) in a machine (15) for editing markup documents (8), which makes it possible to create reference files (8) and associated translation files (10) from information obtained from the user and store them in storage means (16). 